Protein identification with sequence tags
Abstract
Genome sequences are available for increasing numbers of organisms. The proteomes (the protein complement expressed by a genome) of some of these organisms are being studied with two-dimensional gel electrophoresis, but identifying the thousands of proteins on two-dimensional gels remains a challenge. Recent progress in mass spectrometric and traditional sequencing methods has increased the speed, sensitivity and ease of protein sequence analysis. Although these methods can produce extensive sequence information, they are also ideal for rapidly generating amino- and carboxy-terminal ‘sequence tags’ of six amino acids or fewer. To investigate the use of such sequence tags for identifying proteins separated on two-dimensional gels, we have written a program, TagIdent, that matches a protein sequence of up to six amino acids against entries in the SWISS-PROT database. Important features of the program are that it allows the user to specify (optionally) the estimated isoelectric point and mass, one or more species of organism to match against, and whether the sequence data are amino- or carboxy-terminal; in this way, searches are highly directed. This is in contrast to BLAST, BLITZ or FASTA, which are global search tools that either cannot search with very short sequences or return lists containing many irrelevant proteins. TagIdent is available on the world-wide web at http://expasy.hcuge.ch/www/tools.html, and results are sent by e-mail.
Use of TagIdent with proteins from organisms whose genomes have been completely, or almost completely, sequenced shows that sequence tags have surprising specificity. Figure 1 shows that a protein from an Escherichia coli two-dimensional gel, sequenced by rapid Edman degradation for only four cycles [1], was identified from among 223 other candidate proteins within the specified windows of isoelectric point (pI) and molecular mass.
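The core operation described above is a terminal-anchored match of a short tag, optionally narrowed by pI, mass and species. The following is a minimal Python sketch of that filtering logic, not TagIdent's actual implementation. The `Protein` record, the `tag_search` function and its parameters, and the toy entries are all hypothetical. The sketch assumes that pI and mass have already been computed for each entry and that stored sequences are the mature forms.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Protein:
    accession: str  # hypothetical identifier, not a real SWISS-PROT accession
    species: str    # species code, e.g. "ECOLI" (assumption)
    sequence: str   # mature protein sequence, one-letter codes (assumption)
    pi: float       # precomputed isoelectric point
    mass: float     # precomputed molecular mass in daltons


def tag_search(
    proteins: Iterable[Protein],
    tag: str,
    terminus: str = "N",
    pi_window: Optional[tuple[float, float]] = None,
    mass_window: Optional[tuple[float, float]] = None,
    species: Optional[set[str]] = None,
) -> list[Protein]:
    """Hypothetical helper: return proteins whose N- or C-terminus matches
    `tag` and that pass the optional pI, mass and species filters."""
    tag = tag.upper()
    if not 1 <= len(tag) <= 6:
        raise ValueError("sequence tag must be 1 to 6 residues long")
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    hits = []
    for p in proteins:
        # Cheap filters first: species, then the pI and mass windows.
        if species is not None and p.species not in species:
            continue
        if pi_window is not None and not pi_window[0] <= p.pi <= pi_window[1]:
            continue
        if mass_window is not None and not mass_window[0] <= p.mass <= mass_window[1]:
            continue
        seq = p.sequence.upper()
        # Anchor the tag at the chosen terminus rather than searching anywhere.
        if seq.startswith(tag) if terminus == "N" else seq.endswith(tag):
            hits.append(p)
    return hits


# Made-up, truncated entries for illustration only.
db = [
    Protein("TOY_A", "ECOLI", "SKIEEGKLVIWINGDK", 5.2, 40000.0),
    Protein("TOY_B", "ECOLI", "SKIEAAKLLG", 6.8, 41000.0),
    Protein("TOY_C", "ECOLI", "MTEYKLVVVG", 5.0, 39000.0),
    Protein("TOY_D", "YEAST", "SKIEEGAAAK", 5.1, 40500.0),
]
hits = tag_search(db, "SKIE", terminus="N", pi_window=(4.5, 6.0),
                  mass_window=(35000.0, 45000.0), species={"ECOLI"})
print([p.accession for p in hits])  # ['TOY_A']
```

Three of the four toy entries carry the same N-terminal tag, yet only one survives once the species and pI windows are applied. This is the sense in which directed searches differ from global similarity searches.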
The identity of the protein was confirmed by using the same sample for identification by amino-acid composition. The theoretical ‘identification’ of 50 randomly selected E. coli proteins, using sequence tags of three, four or five amino acids and appropriate pI and mass windows, revealed the same trend. At the amino terminus, 68% of proteins could be uniquely identified with a three-residue tag, 90% with a four-residue tag, and 94% with a five-residue tag. The remaining proteins were not uniquely identified, but were correctly assigned as members of a family.
How accurate is the program, and how widely can it be applied? Accurate identification relies on all proteins from an organism being in sequence databases. Thus, if only one protein within a given pI and mass range is found with a certain amino- or carboxy-terminal sequence tag, one can be confident that there is no other, as yet undescribed, protein that could otherwise match the tag. In fully sequenced organisms, the procedure is thus self-checking. The specificity of s...
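The theoretical identification experiment can be sketched in the same way. For each protein, take its own k-residue terminal tag, search the whole proteome within pI and mass windows centred on that protein, and count the protein as uniquely identified if it is the only hit. As the text notes, this works as a self-check only because the proteome is assumed to be complete. The `Entry` type, the `unique_fraction` function, the window sizes and the toy sequences below are hypothetical illustrations, not the parameters used for the 50 E. coli proteins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str       # hypothetical protein name
    sequence: str   # mature sequence, one-letter codes (assumption)
    pi: float       # precomputed isoelectric point
    mass: float     # precomputed molecular mass in daltons


def matches_tag(entry: Entry, tag: str, terminus: str) -> bool:
    seq = entry.sequence
    return seq.startswith(tag) if terminus == "N" else seq.endswith(tag)


def unique_fraction(proteome: list[Entry], k: int, pi_tol: float,
                    mass_tol: float, terminus: str = "N") -> float:
    """Fraction of entries whose own k-residue terminal tag, searched within
    +/- pi_tol pH units and +/- mass_tol (relative) of that entry's pI and
    mass, matches that entry alone."""
    if not proteome:
        raise ValueError("proteome is empty")
    if k < 1:
        raise ValueError("tag length must be at least 1")
    unique = 0
    for query in proteome:
        tag = query.sequence[:k] if terminus == "N" else query.sequence[-k:]
        candidates = [
            p for p in proteome
            if abs(p.pi - query.pi) <= pi_tol
            and abs(p.mass - query.mass) <= mass_tol * query.mass
            and matches_tag(p, tag, terminus)
        ]
        # The query always matches itself, so exactly one hit means unique.
        if len(candidates) == 1:
            unique += 1
    return unique / len(proteome)


# Made-up toy proteome; all three entries fall inside each other's windows.
toy = [
    Entry("P1", "MKALIVLG", 5.0, 30000.0),
    Entry("P2", "MKAVLLAG", 5.1, 30500.0),
    Entry("P3", "MSTQAILG", 5.0, 30000.0),
]
for k in (2, 3, 4):
    print(k, unique_fraction(toy, k, pi_tol=0.5, mass_tol=0.05))
```

In this toy set, the uniquely identified fraction is 1/3 for two- and three-residue amino-terminal tags and rises to 1.0 with four residues. This is the same qualitative trend as the one reported above, shown here on a deliberately tiny example.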
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
Journal: Current Biology
Volume: 6
Publication year: 1996